Microsoft Word - final_turk

Authors

S. A. Selouani

J. Caelen

Abstract

In this paper, we present an approach that significantly improves the performance of automatic speech recognition systems (ASRSs) dedicated to the Arabic language. We propose to combine a version of Learning Vector Quantization (LVQ) with Time Delay Neural Networks (TDNNs) trained using an autoregressive (AR) version of the backpropagation algorithm. The underlying idea is to incorporate AR-TDNNs into a hybrid structure in order to give the LVQ-based system the ability to overcome failures caused by particularities of the language such as emphasis, gemination, and vowel lengthening. The test corpus is composed of subsets taken from an Arabic database. The results show that the proposed LVQ/AR-TDNN system achieves a higher recognition rate than the baseline LVQ-based system.

1. THE PROBLEM OF RECOGNIZING COMPLEX ARABIC PHONEMES

Current automatic speech recognition (ASR) systems dedicated to Arabic still face the problems posed by the strong inflection of the language. This syntactic particularity is compounded by the poverty of the Arabic vowel system, which is partially compensated for by the semantic relevance of vowel lengthening. In the consonantal system, a further phonetic complexity lies in the presence of features as subtle as emphasis and gemination [1]. Unfortunately, existing ASR systems do not take these phonetic properties into account, although doing so would limit the drop in their performance. One consequence is the near absence of commercial products dedicated to Arabic, at a time when the so-called ‘language industries’ are booming. For example, in the case of the emphatic vs. non-emphatic opposition, an efficient ASR system must be able to distinguish between the two words /sa:ra/ (to walk) and /ṣa:ra/ (to become), where the fricative /s/ is emphatic in the second word. Current ASR systems cannot easily resolve this ambiguity.
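The baseline described above is an LVQ-based classifier. The paper uses a modified version of Kohonen's LVQ whose details are not given in this excerpt, so the following is only a minimal sketch of standard LVQ1 in plain Python, assuming fixed-length acoustic feature vectors and one class label per prototype. All names (`lvq1_train`, `lvq_classify`), the default `alpha` and `epochs`, and the linearly decaying learning rate are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(x: Sequence[float], prototypes: List[Vector]) -> int:
    """Index of the prototype closest to x (the 'winner')."""
    return min(range(len(prototypes)), key=lambda k: squared_distance(x, prototypes[k]))


def lvq1_train(
    data: List[Tuple[Vector, int]],
    prototypes: List[Vector],
    labels: List[int],
    alpha: float = 0.1,
    epochs: int = 20,
) -> List[Vector]:
    """Standard LVQ1 (not the paper's modified variant); names are hypothetical."""
    protos = [list(p) for p in prototypes]  # work on a copy of the initial prototypes
    for epoch in range(epochs):
        rate = alpha * (1.0 - epoch / epochs)  # linearly decaying learning rate (assumed schedule)
        for x, y in data:
            k = nearest_prototype(x, protos)
            # Attract the winner if its label matches the sample, repel it otherwise.
            sign = 1.0 if labels[k] == y else -1.0
            protos[k] = [p + sign * rate * (xi - p) for p, xi in zip(protos[k], x)]
    return protos


def lvq_classify(x: Sequence[float], prototypes: List[Vector], labels: List[int]) -> int:
    """Return the label of the nearest prototype."""
    return labels[nearest_prototype(x, prototypes)]
```

LVQ1 moves only the winning prototype, toward the sample when the labels agree and away from it when they differ. As sketched here, the classifier looks at one feature vector at a time and therefore has no explicit notion of duration, which helps explain why length-based oppositions such as gemination and vowel lengthening call for an added temporal component.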
The following example illustrates the gemination case: the ASR system must discriminate between the two words /nafaδa/ (to escape) and /naf:aδa/ (to execute), where the fricative /f/ is geminated. A similar problem arises in the vowel system. For instance, the two words /suru:ru/ (happiness) and /sururu/ (umbilical cord) differ only in the length of the second vowel /u/. The recognition system must detect this vowel without altering its temporal property. We propose to achieve this through an original combination of Waibel's TDNN [12] and a modified version of Kohonen's LVQ algorithm [7].

2. AUTOREGRESSIVE TIME DELAY NEURAL NETWORKS (AR-TDNN)

Compared with feedforward networks, recurrent networks are generally trickier to work with, but they are theoretically more powerful, since they can represent temporal sequences of unbounded length. Because speech is a temporally unstable phenomenon, we consider recurrent networks more adequate than feedforward networks. A further consideration, the influence of phonetic context, leads us to use the autoregressive version of the backpropagation algorithm (AR-backpropagation) proposed by Russel [9]. In principle, this type of network naturally captures the coarticulation phenomenon of speech, and some studies show that it performs very well in context-dependent labeling. However, this strength turns out to be a source of disappointment when phonemes are shifted in time. The approach we investigate therefore integrates, in addition to the AR component, a delay component similar to the one used in Waibel's TDNN [12]. We expect this combination to increase the system's ability to discern phonological length even in a strongly coarticulated context. The model described by Russel et al. includes an autoregressive memory, a form of self-feedback in which the output depends on the current input plus a weighted sum of previous outputs.
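Waibel's TDNN, referred to above, gains tolerance to time shifts by letting each unit see a short window of consecutive frames through weights that are shared across all time positions. The sketch below shows one such time-delay layer in plain Python. The frame layout, the `delay` parameter, the function name `time_delay_layer`, and the tanh activation are illustrative assumptions; the layer sizes and delays used in the paper are not given in this excerpt.

```python
import math
from typing import List

Frames = List[List[float]]  # frames[t][f]: feature f of frame t


def time_delay_layer(
    frames: Frames,
    weights: List[List[List[float]]],
    biases: List[float],
    delay: int,
) -> Frames:
    """One TDNN-style layer (illustrative sketch, hypothetical names).

    weights[u][d][f] connects feature f of frame t + d (d = 0..delay) to unit u.
    The same weights are reused at every time position t, which is what makes
    the layer tolerant to time shifts of the input.
    Returns len(frames) - delay output frames with one activation per unit.
    """
    outputs: Frames = []
    for t in range(len(frames) - delay):
        row = []
        for u, unit_weights in enumerate(weights):
            s = biases[u]
            # Weighted sum over the window of frames t .. t + delay.
            for d in range(delay + 1):
                for f, x in enumerate(frames[t + d]):
                    s += unit_weights[d][f] * x
            row.append(math.tanh(s))  # tanh activation assumed
        outputs.append(row)
    return outputs
```

Because the same weights are applied at every time step, shifting the input by one frame shifts the output by exactly one frame. This is the property the delay component is meant to contribute when phonemes are shifted in time.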
The classical AR node equation is then given by:

$$y_i(t) = f\left(\mathrm{bias}_i + \sum_{j=1}^{P} w_{i,j}\, x_j(t) + \sum_{n=1}^{M} a_{i,n}\, y_i(t-n)\right) \qquad (1)$$

where $y_i(t)$ is the output of node $i$ at time $t$, $x_j(t)$ is the $j$-th input at time $t$, $f(x) = \tanh(x)$ is the bipolar activation function, $P$ is the number of input units, and $M$ is the order of the autoregressive prediction. The weights $w_{i,j}$, the biases, and the coefficients $a_{i,n}$ are adaptive and are optimized to minimize the output error. Our proposal consists in incorporating a time delay component on the input nodes of each layer; equation (1) then becomes equation (2), in which $L$ is the delay order at the input.
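The following sketch computes equation (1) for a single node over a sequence of input frames. Because equation (2) is not reproduced in this excerpt, the delayed-input form used here, with a separate weight vector for each input delay l = 0, ..., L, is an assumption; with a single delay row it reduces exactly to equation (1). Zero values before t = 0 are also assumed, and training by AR-backpropagation is omitted. The function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def ar_node_outputs(
    inputs: Sequence[Sequence[float]],
    delay_weights: Sequence[Sequence[float]],
    ar_coeffs: Sequence[float],
    bias: float,
) -> List[float]:
    """Outputs y_i(t) of one AR node (equation (1)), optionally with input delays.

    inputs[t][j]        -> x_j(t)
    delay_weights[l][j] -> weight on x_j(t - l), l = 0..L
                           (assumed form; a single row l = 0 gives w_{i,j} of equation (1))
    ar_coeffs[n - 1]    -> a_{i,n}, n = 1..M
    """
    delay_order = len(delay_weights) - 1  # L
    ar_order = len(ar_coeffs)             # M
    outputs: List[float] = []
    for t in range(len(inputs)):
        s = bias
        # Feedforward part: current input plus (assumed) delayed inputs.
        for l in range(delay_order + 1):
            if t - l >= 0:  # inputs before t = 0 are taken as zero (assumption)
                s += sum(w * x for w, x in zip(delay_weights[l], inputs[t - l]))
        # Autoregressive self-feedback: weighted sum of the node's previous outputs.
        for n in range(1, ar_order + 1):
            if t - n >= 0:  # outputs before t = 0 are taken as zero (assumption)
                s += ar_coeffs[n - 1] * outputs[t - n]
        outputs.append(math.tanh(s))  # f(x) = tanh(x), the bipolar activation
    return outputs


if __name__ == "__main__":
    # An impulse at t = 0 with M = 1: the output decays but persists via the AR memory.
    print(ar_node_outputs([[1.0], [0.0], [0.0]], [[0.5]], [0.5], bias=0.0))
```

The AR term lets a node keep responding after its input has returned to zero, which is how the model carries context across frames, while the delay rows let it weigh several recent input frames directly.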

Publication date: 1999
